Method and apparatus for diagnostic recording using transactional memory

ABSTRACT

A method (500) or a diagnostic recording device (400) having transactional memory and a processor coupled to the transactional memory can store (502) contents of a transaction log (40) of the transactional memory, detect (504) an exception event, and replay (506) last instructions that led up to the exception event using a debugger tool (80). The transactional memory can be hardware or software based transactional memory. The processor can also store the transaction log by storing the contents of the transaction log in a core file (302) which can include a stack (60), a register dump (70), a memory dump (75), and the transactional log. The debugger tool can be used to load up the core file, an executable file (95), and a library (90) to enable the diagnostic recording device to retrace transactions occurring at the diagnostic recording device up to the exception event.

FIELD OF THE INVENTION

The present invention relates to the field of diagnostic recording and, more particularly, to a method and apparatus for diagnostic recording using transactional memory.

BACKGROUND

The ability to collect data about a software application's execution as it runs is often referred to as a “flight recorder.” A software application flight recorder, though, is frequently thought of as cost prohibitive because it involves a significant amount of overhead. More specifically, a flight recorder in software is typically built as an in-memory circular buffer that stores frequent activity with very little overhead per entry. Even so, tracking every read/write in such a buffer, despite being very useful when a problem occurs, would be too expensive and would likely reduce performance by 50% or more. There is usually no “log” and no “transactions” when using a flight recorder; a flight recorder is very primitive. The buffer can be dumped out on demand or when a problem occurs. Building a flight recorder into hardware for the sole purpose of diagnostics unfortunately does not justify the overhead and hardware enhancements required.
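By way of illustration only, the in-memory circular buffer described above can be sketched as follows. The class name FlightRecorder, its methods, and the sample events are hypothetical and are not taken from any particular product; the sketch merely shows that a fixed-capacity buffer retains only the most recent activity.

```python
from collections import deque


class FlightRecorder:
    """Hypothetical in-memory circular buffer keeping only recent events."""

    def __init__(self, capacity):
        # A deque with maxlen discards its oldest entry once it is full.
        self._events = deque(maxlen=capacity)

    def record(self, event):
        self._events.append(event)

    def dump(self):
        # Return a snapshot of the buffer, oldest entry first.
        return list(self._events)


recorder = FlightRecorder(capacity=3)
for step in ["open file", "read header", "parse body", "write result"]:
    recorder.record(step)

# Only the three most recent events survive in the buffer.
snapshot = recorder.dump()
```

In this sketch the buffer can be dumped on demand by calling dump(), or from an error handler when a problem occurs.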

However, the flight recorder does allow, when there is a problem, the trace file to be played back for root cause analysis. Thus, there is often no need to reproduce the problem. In many cases, the source of the problem can be determined from a single occurrence. Transactional memory has been used for other purposes, but not for diagnostic recording.

SUMMARY OF THE INVENTION

Hardware-based transactional memory can provide a very useful flight recorder with very little additional investment. Embodiments herein can provide a mechanism to handle a memory exception (or trap) and treat this as similar to a violation. Transactional memory can provide a means to access a memory transaction log for a thread and dump such information as needed to serve as a flight recorder of recent activity. The transaction log can be stored in a core file or other file, and the steps leading up to the exception event or trap can be replayed post-mortem inside the debugger.

The embodiments of the present invention can be implemented in accordance with numerous aspects consistent with the material presented herein. For example, one aspect of the present invention can include a diagnostic recording method using (software or hardware-based) transactional memory including the steps of storing a transaction log of the transactional memory (of most recent memory accesses, for example), detecting an exception event, and replaying last instructions that led up to the exception event using a debugger tool. The step of storing the transaction log can include storing the transaction log in a core file.

Another aspect of the present invention can include a diagnostic recording device such as a diagnostic flight recorder having transactional memory and a processor coupled to the transactional memory. The processor can be operable to store contents of a transaction log of the transactional memory (of most recent memory accesses, for example), detect an exception event, and replay last instructions that led up to the exception event using a debugger tool. As noted above, the transactional memory can be hardware-based transactional memory or software-based transactional memory. The processor can also be operable to store the contents of the transaction log by storing contents of the transaction log in a file such as a core file. The core file can include a stack, a register dump, a memory dump, and the transaction log. The processor can be further operable to make a special system call into an operating system to provide a calling thread access to its own transaction log. The processor can be further operable to cause the debugger tool to load up the core file, an executable file, and a library to enable the diagnostic recording device to retrace transactions occurring at the diagnostic recording device up to the exception event. The exception event is also known as a trap or trap condition, fault, program check, or violation.

It should be noted that various aspects of the invention can be implemented as a program or a computer implemented method for controlling computing equipment to implement the functions described herein, or a program for enabling computing equipment to perform processes corresponding to the steps disclosed herein. This program may be provided by storing the program in a magnetic disk, an optical disk, a semiconductor memory, any other recording medium, or can also be provided as a digitally encoded signal conveyed via a carrier wave. The described program can be a single program or can be implemented as multiple subprograms, each of which interact within a single computing device or interact in a distributed fashion across a network space.

BRIEF DESCRIPTION OF THE DRAWINGS

There are shown in the drawings, embodiments which are presently preferred, it being understood, however, that the invention is not limited to the precise arrangements and instrumentalities shown.

FIG. 1 is a linear representation of instructions executed on a processing unit by a thread in accordance with an embodiment of the present invention.

FIG. 2 is an illustration of an exception event or trap condition in accordance with an embodiment of the present invention.

FIG. 3 is an illustration of the handling of the exception event or trap condition by dumping information in accordance with an embodiment of the present invention.

FIG. 4 is a block diagram of the elements used for debugging transactional memory in accordance with an embodiment of the present invention.

FIG. 5 is a flow chart illustrating a method of diagnostic recording using transactional memory in accordance with an embodiment of the present invention.

DETAILED DESCRIPTION

Embodiments herein can include (but are not limited to) a hardware based transactional memory technology that keeps a log of recent memory accesses in a hardware buffer. When a hardware exception (trap, fault, violation, etc.) occurs, this log can be dumped to a file (or as part of a file such as a core file) and used with a debugger for post mortem analysis. The last instructions that led up to the exception would be recorded and replayed with a debugger using the transaction log as a reference. Note that replaying can be done by a tool or by a human using a debugger or a debugger interface.

The idea is an extension of proposed hardware-based transactional memory models. The log that is required for hardware-based transactional memory can be dumped when a failure occurs and used for debugging purposes (e.g. via a debugger) to replay the last instructions that led up to the exception. The cause, although often obfuscated by complexity and overwritten data, can be captured between the transaction log and the virtual memory image for the process or operating system.

Referring to FIG. 1, each square represents one instruction 10 that is run on a CPU for a thread 100. The instructions denoted as 10 are normal instructions, while an instruction 20 is an open transaction instruction and an instruction 30 is an end-transaction instruction. Inside of a transaction, the updates and reads to memory will be tracked in a transaction log 40 that can be hardware based (or software based if a software based transactional memory model is used). Assuming that hardware-based transactional memory becomes an industry standard, programs will run normally with these in-memory transaction logs. The purpose of the logs is to identify a break-down of memory access integrity between threads, but the content of the transaction logs, which will be in memory anyway, can also be used for diagnostics as contemplated herein. There is no need to turn on the transaction log, since the transactional memory can operate as a flight recorder that will be on all of the time (although its primary use would be to support transactional memory).
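As a non-limiting sketch of the arrangement of FIG. 1, a software model of a transaction log can be written as follows. The names TransactionalMemory, LogEntry, begin, and end are hypothetical stand-ins for the open transaction instruction 20, the end-transaction instruction 30, and the transaction log 40; an actual hardware-based implementation would keep the log in a hardware buffer and may differ substantially.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LogEntry:
    kind: str                  # "read" or "write"
    address: int
    old_value: Optional[int]   # value held before the access (None if unset)
    new_value: Optional[int]   # value stored by a write (None for a read)


class TransactionalMemory:
    """Hypothetical software model; accesses inside a transaction are logged."""

    def __init__(self) -> None:
        self.memory: Dict[int, int] = {}
        self.log: List[LogEntry] = []
        self._in_transaction = False

    def begin(self) -> None:
        # Stands in for the open transaction instruction 20.
        self._in_transaction = True

    def end(self) -> None:
        # Stands in for the end-transaction instruction 30.
        self._in_transaction = False

    def read(self, address: int) -> Optional[int]:
        value = self.memory.get(address)
        if self._in_transaction:
            self.log.append(LogEntry("read", address, value, None))
        return value

    def write(self, address: int, value: int) -> None:
        if self._in_transaction:
            old = self.memory.get(address)
            self.log.append(LogEntry("write", address, old, value))
        self.memory[address] = value


tm = TransactionalMemory()
tm.write(0x10, 1)   # normal instruction outside a transaction: not logged
tm.begin()
tm.read(0x10)       # logged as a read
tm.write(0x10, 2)   # logged as a write, with the old value preserved
tm.end()
tm.write(0x20, 5)   # outside a transaction again: not logged
```

Recording the old value with each write is a design assumption of this sketch; it is what later allows the effect of a write to be undone during post-mortem analysis.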

Referring to the thread 200 of FIG. 2, an instruction 50 represents a trap condition. On certain operating systems, such as z/OS®, this is called a program check. On other platforms, this is known as a trap, fault, or exception event or condition. The issue is that “something bad happened” that has been noticed by the hardware, such as an invalid memory access. At this point, a trap, signal, fault, exception or program check handler is called. The purpose of this handler is to dump information about the exception event (the term used from here on) such as a stack trace, registers, or other portions of a core file. In addition to the other types of information, embodiments herein can also include a special system call (a call into the OS) that would give the calling thread access to its own transaction log. The OS can also dump the information along with a core file or similar concept.
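For illustration, the role of the handler can be sketched in software, with a language-level exception standing in for the hardware trap. The function get_transaction_log is a hypothetical stand-in for the special system call; no existing operating system interface is implied, and the module-level list merely simulates a per-thread log.

```python
import traceback

# Simulated per-thread transaction log (hypothetical; stands in for the
# log that the transactional memory would maintain).
_transaction_log = []


def get_transaction_log():
    """Hypothetical stand-in for the special system call into the OS."""
    return list(_transaction_log)


def handle_exception_event(exc):
    """Collect the information a handler might dump for the exception event."""
    return {
        "exception": type(exc).__name__,
        "stack": traceback.format_exception(type(exc), exc, exc.__traceback__),
        "transaction_log": get_transaction_log(),
    }


def faulty_thread():
    table = [10, 20, 30]
    index = 3
    _transaction_log.append(("read", "index", index))
    return table[index]   # invalid access: raises IndexError


try:
    faulty_thread()
    dump = None
except IndexError as exc:
    # The exception event triggers the handler, which gathers the dump.
    dump = handle_exception_event(exc)
```

Here the resulting dump holds the stack trace together with the transaction log, analogous to the information that could accompany a core file.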

As illustrated in FIG. 3, the exception event or trap represented by instruction 50 triggers the trap handling 304 where the dumped information can be a core file 302 that can contain (among other things) a stack 60, a register dump 70, a memory dump 75 and the transaction log or flight recorder 40. FIG. 4 further illustrates a debugger 80 of any kind, the data from the trap condition (including the register dump 70, the transaction log 40, and the stack 60), a set of libraries 90, and an executable 95. The debugger 80 would load up the executable 95, library 90 and core file 302 (which contains the stack, the register dump, and the memory dump). The debugger 80 would also have access to the flight-recorder-like transaction log 40. Using this log (originally intended for hardware-based transactional memory), the debugger 80 can retrace or walk backward in time, looking at the impact of each instruction and undoing the impact that the instruction had on the real memory (if actually written to memory). Such an arrangement would allow an investigator (tool or human) to replay the transaction up to the point of the trap or exception event. A human or tool (particularly a tool with a debugger interface) can use the debugger or a debugging interface to retrace. Of course, a debugger can be arranged to facilitate such retracing as contemplated herein as a potential embodiment.
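The backward walk described above can be sketched as follows, assuming (as a hypothetical log format not specified herein) that each log entry is a tuple of access kind, address, old value, and new value. Starting from the memory image at the time of the exception event, each logged write is undone in reverse order by restoring its old value.

```python
def walk_backward(memory, log):
    """Yield the memory state after undoing each logged access, newest first.

    Each log entry is assumed to be (kind, address, old_value, new_value).
    """
    state = dict(memory)   # work on a copy of the dumped memory image
    for kind, address, old_value, _new_value in reversed(log):
        if kind == "write":
            if old_value is None:
                # The address held nothing before the write: remove it.
                state.pop(address, None)
            else:
                state[address] = old_value
        # Reads leave memory unchanged but are still reported.
        yield (kind, address), dict(state)


# Memory image at the time of the exception event (hypothetical values).
memory_at_trap = {0x10: 0, 0x20: 7}

# Accesses in the order they occurred before the exception event.
log = [
    ("write", 0x20, None, 7),
    ("write", 0x10, 5, 0),
    ("read", 0x10, 0, None),
]

history = list(walk_backward(memory_at_trap, log))
```

In this example, undoing the second write reveals that address 0x10 held the value 5 before it was overwritten with 0, which is the kind of overwritten data that the memory image alone would not show.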

Referring to FIG. 5, a flow chart illustrates a diagnostic recording method 500 using (software or hardware based) transactional memory, including the step 502 of storing contents of a transaction log of the transactional memory (of most recent memory accesses, for example), detecting an exception event at step 504, and replaying last instructions that led up to the exception event using a debugger tool at step 506. The step of storing the contents of the transaction log can include storing the contents of the transaction log in a file such as a core file.
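A minimal end-to-end sketch of the steps of method 500 is given below under stated assumptions: a JSON file stands in for the core file, a division by zero stands in for the exception event, and the function names store_log, replay, and run are hypothetical. In this sketch the log is written to the file once the exception event is detected.

```python
import json
import os
import tempfile


def store_log(path, log, memory):
    # Step 502: store the transaction log (here alongside a memory dump).
    core = {"memory_dump": memory, "transaction_log": log}
    with open(path, "w") as f:
        json.dump(core, f)


def replay(path):
    # Step 506: read the stored log back and list the accesses in order.
    with open(path) as f:
        core = json.load(f)
    return [
        "{} {}={}".format(e["op"], e["name"], e["value"])
        for e in core["transaction_log"]
    ]


def run(path):
    memory = {"divisor": 1}
    log = []
    try:
        memory["divisor"] = 0
        log.append({"op": "write", "name": "divisor", "value": 0})
        log.append({"op": "read", "name": "divisor", "value": memory["divisor"]})
        return 100 // memory["divisor"], []
    except ZeroDivisionError:
        # Step 504: the exception event is detected.
        store_log(path, log, memory)
        return None, replay(path)


with tempfile.TemporaryDirectory() as workdir:
    result, steps = run(os.path.join(workdir, "core.json"))
```

The replayed steps show the write that set the divisor to zero immediately before the failing read, so the cause can be identified from a single occurrence.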

The present invention may be realized in hardware, software, or a combination of hardware and software. The present invention may be realized in a centralized fashion in one computer system or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system or other apparatus adapted for carrying out the methods described herein is suited. A typical combination of hardware and software may be a general purpose computer system with a computer program that, when being loaded and executed, controls the computer system such that it carries out the methods described herein.

The present invention also may be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which when loaded in a computer system is able to carry out these methods. Computer program in the present context means any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code or notation; b) reproduction in a different material form.

This invention may be embodied in other forms without departing from the spirit or essential attributes thereof. Accordingly, reference should be made to the following claims, rather than to the foregoing specification, as indicating the scope of the invention. 

1. A diagnostic recording method, comprising the steps of: storing contents of a transaction log of a transactional memory; detecting an exception event; and replaying last instructions that led up to the exception event using a debugger tool.
 2. The method of claim 1, wherein the step of storing the contents of the transaction log comprises storing the contents of the transaction log of most recent memory accesses.
 3. The method of claim 1, wherein the transactional memory is a hardware-based transactional memory.
 4. The method of claim 1, wherein the transactional memory is a software-based transactional memory.
 5. The method of claim 1, wherein the step of storing the contents of the transaction log comprises storing contents of the transaction log in a core file.
 6. The method of claim 1, wherein the transactional memory is used as a flight recorder.
 7. A computer program embodied in a computer storage medium and operable in a data processing machine for diagnostic recording using transactional memory, comprising instructions executable by the data processing machine that cause the data processing machine to: store contents of a transaction log of the transactional memory; detect an exception event; and replay last instructions that led up to the exception event using a debugger tool.
 8. The computer program of claim 7, wherein the computer program further includes instructions to cause the data machine to store the contents of the transaction log of most recent memory accesses.
 9. The computer program of claim 8, wherein the computer program includes instructions to cause the data machine to store the most recent memory accesses in a hardware-based transactional memory.
 10. The computer program of claim 8, wherein the computer program includes instructions to cause the data machine to store the most recent memory accesses in a software-based transactional memory.
 11. The computer program of claim 7, wherein the computer program includes instructions to cause the data machine to store the contents of the transaction log in a core file.
 12. A diagnostic recording device, comprising: transactional memory; a processor coupled to the transactional memory, wherein the processor is operable to: store contents of a transaction log of the transactional memory; detect an exception event; and replay last instructions that led up to the exception event using a debugger tool.
 13. The diagnostic recording device of claim 12, wherein the processor is operable to store the contents of the transaction log by storing the contents of the transaction log of most recent memory accesses.
 14. The diagnostic recording device of claim 12, wherein the transactional memory is a hardware-based transactional memory or software-based transactional memory.
 15. The diagnostic recording device of claim 12, wherein the processor is operable to store the contents of the transaction log by storing the contents of the transaction log in a core file.
 16. The diagnostic recording device of claim 12, wherein the diagnostic recording device is a diagnostic flight recorder.
 17. The diagnostic recording device of claim 12, wherein the processor is further operable to make a special system call into an operating system to provide call thread access to its own transaction log.
 18. The diagnostic recording device of claim 12, wherein the processor is operable to dump information from a core file comprising a stack, a register dump, a memory dump, and the transactional log.
 19. The diagnostic recording device of claim 18, wherein the processor is further operable to cause the debugger tool to load up the core file, an executable file, and a library to enable the diagnostic recording device to retrace transactions occurring at the diagnostic recording device up to the exception event. 